draft-ietf-wnils-whois-00.txt
WNILS Working Group                                         Chris Weider
INTERNET-DRAFT                                       Merit Network, Inc.
                                                             Jim Fullton
                                                         UNC Chapel Hill
                                                             Simon Spero
11/10/92                                                 UNC Chapel Hill
Architecture of the Whois++ Index Service
Status of this memo:
The authors describe an architecture for indexing in distributed databases,
and apply it to the WHOIS++ protocol.
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted
by other documents at any time. It is not appropriate to use
Internet Drafts as reference material or to cite them other than
as a "working draft" or "work in progress."
Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any
other Internet Draft.
This Internet Draft expires May 10, 1993.
1. Purpose:
The WHOIS++ directory service [GDS, 1992] is intended to provide
a simple, extensible directory service predicated on a template-based
information model and a flexible query language. This document describes
an architecture designed to link together many of these WHOIS++ servers
into a distributed, searchable wide area directory service.
2. Scope:
This document details a distributed, easily maintained architecture for
providing a unified index to a large number of distributed WHOIS++
servers. This architecture can be used with systems other than WHOIS++ to
provide a distributed directory service which is also searchable.
3. Motivation and Introduction:
It seems clear that with the vast amount of directory information potentially
available on the Internet, it is simply infeasible to build a centralized
directory to serve all this information. Therefore, we should look at building
a distributed directory service. If we are to distribute the directory service,
the easiest (although not necessarily the best) way of building it is to
build a hierarchy of directory information collection agents.
In this architecture, a directory query is delivered to a certain agent
in the tree, and then handed up or down, as appropriate, so that the query
is delivered to the agent that holds the information that answers the query.
This approach has been tried before, most notably in some implementations of
the X.500 standard. However, there are three major flaws in the approach
as it has been taken, described in the following sections. This new Index
Service is designed to fix these flaws.
3.1 The search problem
Current implementations of this hierarchical architecture require that a search
query issued at a certain location in the directory agent tree be replicated
to _all_ subtrees, because there is no way to tell which subtrees might
contain the desired information. Since every search must then reach every
server below the point where it is issued, this scales very poorly; in fact,
the search facility has been turned off in the X.500 architecture because of
this problem. Our new WHOIS++ architecture solves this problem by keeping
'forward information' at each level of the tree. That is, each level of the
tree has some idea of where to look lower in the tree to find the requested
information. Consequently, the search tree can be pruned enormously, making
search feasible at all levels of the tree. We have chosen a certain set of
information to hand up the tree as forward information; this may or may not
be exactly the set of information required to build a truly searchable
directory. However, it seems clear that without some sort of forward
information, the search problem becomes intractable.
3.2 The location problem
Current implementations of this hierarchical architecture also encode details
about the directory agent hierarchy in the location information for a specific
entry. With search turned off, this requires a user to know exactly how
the hierarchy of servers is laid out and how the servers are named. This leads
to acrimonious debate about the shape of the name space, and to major
headaches whenever it becomes apparent that the current name space is unsuited
to current usage and must be changed. The new Index Service avoids these
problems by a) not enforcing a true hierarchy on the directory agents, b)
dissociating the directory service from the information served, and c)
allowing new hierarchies to be built whenever necessary, without destroying
the hierarchies already in place. Thus a user does not need to know in
advance where in the hierarchy the information served is contained, and the
information a user enters to guide the search never has to appear explicitly
in the hierarchy. Although the WHOIS++ query syntax provides a way to watch
the directory service as it hands the query around, and consequently to
divine the structure of the directory service hierarchy, that structure is
not relevant to the user and never has to be taken into consideration.
3.3 The Yellow Pages problem
Current implementations of this hierarchical architecture have also been
unsuited to solving the Yellow Pages problem; that is, the problem of
easily and flexibly building special-purpose directories (say, of
molecular biologists) and of automatically maintaining these directories
once they have been built. In particular, with the current systems one has
to build the attributes appropriate to the new directory into the name space.
Since our new Index Service very easily allows directory servers to pick and
choose among the information proffered by a given entry server, and because
the architecture allows for automatic polling of data, Yellow Pages
capabilities fall naturally out of the design. Although the ability to
search all levels of the tree(s) gets us a long way towards the Yellow
Pages, it is this capacity to locate, gather, and maintain information
in a distributed and selective way that really solves the problem.
4. Components of the Index Service:
4.1 WHOIS++ servers
The whois++ service is described in [GDS, 1992]. As that service specifies
only the query language, the information model, and the server responses,
whois++ services can be provided by a wide variety of databases and directory
services. However, to participate in the Index Service, that underlying
database must also be able to generate a 'centroid' for the data it serves.
4.2 Centroids as forward knowledge
The centroid of a server consists of a list of the templates and
attributes used by that server, and a word list for each attribute.
The word list for a given attribute contains exactly one occurrence of every
word that appears at least once in that attribute in some record in that
server's data, and nothing else.
For example, if a whois++ server contains exactly three records, as follows:
   Record 1                          Record 2
   Template:        User             Template:        User
   First Name:      John             First Name:      Joe
   Last Name:       Smith            Last Name:       Smith
   Favourite Drink: Labatt Beer      Favourite Drink: Molson Beer

   Record 3
   Template:        Domain
   Domain Name:     foo.edu
   Contact Name:    Mike Foobar

the centroid for this server would be:

   Template:        User
   First Name:      Joe
                    John
   Last Name:       Smith
   Favourite Drink: Beer
                    Labatt
                    Molson

   Template:        Domain
   Domain Name:     foo.edu
   Contact Name:    Mike
                    Foobar
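The following Python fragment is a minimal sketch, not part of this draft, of how a server might compute its centroid; run on the three records above, it reproduces the example. It assumes that words are whitespace-separated tokens compared case-sensitively (the draft defines neither tokenization nor case handling), and it stores each word list as a set, since every word appears once and the example does not fix an order. The record layout (a dict with the template name under "Template") is hypothetical.

```python
from collections import defaultdict

# Hypothetical record layout: attribute -> value, template name under "Template".
Record = dict[str, str]
Centroid = dict[str, dict[str, set[str]]]  # template -> attribute -> word set


def compute_centroid(records: list[Record]) -> Centroid:
    """Collect, per template and attribute, every word seen in any record."""
    centroid: defaultdict = defaultdict(lambda: defaultdict(set))
    for record in records:
        template = record["Template"]
        for attribute, value in record.items():
            if attribute == "Template":
                continue
            # Assumed tokenization: split on whitespace, keep case as-is.
            centroid[template][attribute].update(value.split())
    # Convert the nested defaultdicts back to plain dicts.
    return {t: dict(attrs) for t, attrs in centroid.items()}


# The three records from the example above.
records = [
    {"Template": "User", "First Name": "John", "Last Name": "Smith",
     "Favourite Drink": "Labatt Beer"},
    {"Template": "User", "First Name": "Joe", "Last Name": "Smith",
     "Favourite Drink": "Molson Beer"},
    {"Template": "Domain", "Domain Name": "foo.edu",
     "Contact Name": "Mike Foobar"},
]
centroid = compute_centroid(records)
```

Note that "foo.edu" stays a single word under this tokenization, which matches the example.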
It is this information which is handed up the tree to provide forward knowledge.
As mentioned above, this may not turn out to be the ideal solution for
forward knowledge, and we suspect that there may be a number of different
sets of forward knowledge used in the Index Service. However, the directory
architecture is in a very real sense independent of what types of forward
knowledge are handed around, and it is entirely possible to build a
unified directory which uses many types of forward knowledge.
4.3 Index servers and Index server Architecture
A whois++ index server collects and collates the centroids (or other forward
knowledge) of either a number of whois++ servers or of a number of other index
servers. An index server must be able to generate a centroid for the
information it contains.
4.3.1 Queries to index servers
An index server will take a query in standard whois++ format, search its
collections of centroids, determine which servers hold records which may fill
that query, and then forward the query to the appropriate servers.
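A minimal Python sketch of this matching step follows. For illustration only, it assumes that a query is a conjunction of (attribute, word) terms that must all occur within one template of a server's centroid; the real whois++ query language [GDS, 1992] is richer, and the server handles are hypothetical. Because a centroid records words but not which record they came from, a match only means the server _may_ hold a matching record. Unlike the flooding described in sec. 3.1, however, servers whose centroids cannot match are never contacted.

```python
Centroid = dict[str, dict[str, set[str]]]  # template -> attribute -> word set
Term = tuple[str, str]                     # (attribute, word), assumed query form


def may_match(centroid: Centroid, terms: list[Term]) -> bool:
    """True if some template's word lists contain every term of the query."""
    return any(
        all(word in attrs.get(attribute, set()) for attribute, word in terms)
        for attrs in centroid.values()
    )


def route_query(collection: dict[str, Centroid], terms: list[Term]) -> list[str]:
    """Handles of the servers to which the query should be forwarded."""
    return sorted(handle for handle, centroid in collection.items()
                  if may_match(centroid, terms))


# Hypothetical centroids collected by one index server.
collection = {
    "server-A": {"User": {"First Name": {"John", "Joe"}, "Last Name": {"Smith"}}},
    "server-B": {"User": {"First Name": {"Mary"}, "Last Name": {"Jones"}}},
}
print(route_query(collection, [("Last Name", "Smith")]))  # ['server-A']
```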
4.3.2 Index server distribution model and centroid propagation
The diagram below illustrates how a tree of index servers is created for
a set of whois++ servers.
whois++                index                 index
servers                servers               servers
                       for                   for
                       whois++               lower-level
                       servers               index servers

 _______
|       |
|   A   |__
|_______|  \            _______
            \----------|       |
 _______               |   D   |__            ______
|       |   /----------|_______|  \          |      |
|   B   |__/                       \---------|      |
|_______|                                    |  F   |
                                  /----------|______|
                                 /
 _______                _______ /
|       |              |       |
|   C   |--------------|   E   |
|_______|              |_______|
In the portion of the index tree shown above, whois++ servers A and B hand their
centroids up to index server D, whois++ server C hands its centroid up to
index server E, and index servers D and E hand their centroids up to index
server F.
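The draft does not say how an index server derives its own centroid from the ones it collects. The sketch below assumes the simplest choice, a union of the word lists per template and attribute, and applies it to the tree in the diagram. The leaf centroids are invented for illustration, and the recursive calls stand in for what would really be a series of POLL exchanges.

```python
Centroid = dict[str, dict[str, set[str]]]  # template -> attribute -> word set


def merge_centroids(centroids: list[Centroid]) -> Centroid:
    """Union several centroids (assumed rule for an index server's centroid)."""
    merged: Centroid = {}
    for centroid in centroids:
        for template, attrs in centroid.items():
            target = merged.setdefault(template, {})
            for attribute, words in attrs.items():
                # setdefault creates a fresh set, so inputs are never mutated.
                target.setdefault(attribute, set()).update(words)
    return merged


# Topology from the diagram: each index server and the servers it polls.
children = {"D": ["A", "B"], "E": ["C"], "F": ["D", "E"]}

# Invented centroids for whois++ servers A, B and C.
leaf = {
    "A": {"User": {"Last Name": {"Smith"}}},
    "B": {"User": {"Last Name": {"Jones"}}},
    "C": {"Domain": {"Domain Name": {"foo.edu"}}},
}


def centroid_of(server: str) -> Centroid:
    """A whois++ server's own centroid, or the merge of what an index server polls."""
    if server in leaf:
        return leaf[server]
    return merge_centroids([centroid_of(child) for child in children[server]])


f_centroid = centroid_of("F")
```

A union keeps every word, so the centroid handed up grows with the subtree; an index server specializing in certain templates (sec. 4.3.5) could instead keep only part of it.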
The number of levels of index servers, and the number of index servers at each
level, will depend on the number of whois++ servers deployed, and the response
time of individual layers of the server tree. These numbers will have to
be determined in the field.
4.3.4 Centroid propagation and changes to centroids
Centroid propagation is initiated by an authenticated POLL command (sec. 5.2).
The format of the POLL command allows the poller to request the centroid of
any or all templates and attributes held by the polled server. After the
polled server has authenticated the poller, it determines which of the
requested centroids the poller is allowed to receive, and then issues a
CENTROID-CHANGES report (sec. 5.3) to transmit the data. When the poller
receives the CENTROID-CHANGES report, it can authenticate the pollee to
determine whether to add the centroid changes to its data. Additionally, if
a given pollee knows which pollers hold centroids from it, it can signal to
those pollers that its centroid has changed by issuing a DATA-CHANGED
command (sec. 5.1). The poller can then determine if and when to issue a
new POLL request to get the updated information. The DATA-CHANGED
command is included in this protocol to allow 'interactive' updating of
critical information.
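One possible poller-side policy for reacting to DATA-CHANGED is sketched below. It is an assumption, not part of the draft: the poller remembers the End-time of its last successful POLL for each pollee, ignores notices about changes it already holds, and otherwise plans a POLL whose Start-time is that remembered End-time, sent no earlier than the pollee's Best-time-to-poll. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

EPOCH = datetime.min.replace(tzinfo=timezone.utc)  # "never polled"


@dataclass
class PollPlan:
    server_handle: str    # pollee to send the POLL to
    when: datetime        # earliest time the poller intends to send it
    start_time: datetime  # POLL Start-time: end of the previous poll window
    end_time: datetime    # POLL End-time: here, the planned send time


@dataclass
class Poller:
    # Hypothetical bookkeeping: End-time of the last successful POLL per pollee.
    last_end: dict[str, datetime] = field(default_factory=dict)

    def on_data_changed(self, handle: str, latest_change: datetime,
                        best_time: Optional[datetime],
                        now: datetime) -> Optional[PollPlan]:
        """React to a DATA-CHANGED notice; return a planned POLL or None."""
        previous = self.last_end.get(handle, EPOCH)
        if latest_change <= previous:
            return None  # the changes are already covered by an earlier poll
        when = now if best_time is None else max(now, best_time)
        return PollPlan(handle, when, start_time=previous, end_time=when)

    def on_poll_completed(self, plan: PollPlan) -> None:
        """Record the window covered once the CENTROID-CHANGES report is applied."""
        self.last_end[plan.server_handle] = plan.end_time
```

Deferring to Best-time-to-poll trades freshness for load on the pollee; a poller indexing critical information might choose to poll immediately instead.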
4.3.5 Query handling and passing algorithm
When an index server receives a query, it searches its collection of centroids,
and determines which servers hold records which may fill that query. As
whois++ becomes widely deployed, it is expected that some index servers
may specialize in indexing certain whois++ templates or perhaps even
certain fields within those templates. If an index server obtains a match
with the query _for those template fields and attributes the server indexes_,
it is to be considered a match for the purpose of forwarding the query.
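The matching rule for a specialized index server might look like the following sketch. It assumes the server keeps a set of (template, field) pairs it indexes, and that a query term on a field it does not index cannot rule a server out, so that term is skipped. The draft leaves open what happens when no query term touches an indexed field; this sketch treats that case as a match, which is a policy choice. All names are hypothetical.

```python
Centroid = dict[str, dict[str, set[str]]]  # template -> attribute -> word set
Term = tuple[str, str]                     # (field, word), assumed query form


def partial_match(centroid: Centroid, indexed: set[tuple[str, str]],
                  template: str, terms: list[Term]) -> bool:
    """Match a query only on the (template, field) pairs this server indexes."""
    attrs = centroid.get(template, {})
    for field_name, word in terms:
        if (template, field_name) not in indexed:
            continue  # not indexed here, so it cannot rule the server out
        if word not in attrs.get(field_name, set()):
            return False
    # Policy choice (assumed): with no indexed term left to fail, forward the query.
    return True


# A hypothetical index server that indexes only the Last Name of User templates.
indexed = {("User", "Last Name")}
centroid = {"User": {"Last Name": {"Smith"}}}
```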
When the index server has completed its search to match the query to a
server, it then forwards the request as described in sec. 5.4.
Each server in the chain can then use the authentication information
included in the FORWARDED-QUERY command to determine whether to continue
forwarding the query.
Also, a whois++ query can specify the 'trace' option, which sends to
the user a string containing the IANA handle and an identification
string for each index server the query is handed to.
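The forwarding and 'trace' behaviour can be illustrated with a small simulation of the tree from sec. 4.3.2. This is a sketch under simplifying assumptions: each server's forward knowledge is flattened to a single word set (attributes are ignored), a query is a single word, authentication is omitted, and the handles and identification strings are invented. Only index servers add a trace line, as the text describes; whois++ servers simply report a hit.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    handle: str  # IANA handle (invented values below)
    ident: str   # identification string reported in traces
    words: set[str] = field(default_factory=set)             # whois++ data
    children: list["Server"] = field(default_factory=list)  # empty for whois++ servers

    def summary(self) -> set[str]:
        """Flattened forward knowledge: every word held at or below this server.

        A real index server would hold polled centroids instead of recomputing.
        """
        result = set(self.words)
        for child in self.children:
            result |= child.summary()
        return result


def forward(server: Server, word: str, want_trace: bool) -> tuple[list[str], list[str]]:
    """Hand a one-word query down the tree; return (matching servers, trace lines)."""
    if not server.children:  # a whois++ server searches its own data
        return ([server.handle] if word in server.words else []), []
    hits: list[str] = []
    trace = [f"{server.handle}: {server.ident}"] if want_trace else []
    for child in server.children:
        if word in child.summary():  # prune subtrees whose forward knowledge cannot match
            child_hits, child_trace = forward(child, word, want_trace)
            hits += child_hits
            trace += child_trace
    return hits, trace


a = Server("A", "whois++ server A", {"John", "Smith"})
b = Server("B", "whois++ server B", {"Joe", "Smith"})
c = Server("C", "whois++ server C", {"foo.edu"})
d = Server("D", "index server D", children=[a, b])
e = Server("E", "index server E", children=[c])
f = Server("F", "index server F", children=[d, e])
hits, trace = forward(f, "Smith", want_trace=True)
```

Here the query for "Smith" reaches A and B through F and D, while E and C are never contacted.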
5. Syntax for operations of the Index Service:
5.1 Data changed syntax
The data changed template looks like this:
DATA-CHANGED:
Version-number: // version number of index service software, used to ensure
// compatibility
Time-of-latest-centroid-change: // time stamp of latest centroid change, GMT
Time-of-message-generation: // time when this message was generated, GMT
Server-handle: // IANA unique identifier for this server
Best-time-to-poll: // For heavily used servers, this will identify when
// the server is likely to be lightly loaded
// so that response to the poll will be speedy, GMT
Authentication-type: // Type of authentication used by server, or NONE
Authentication-data: // data for authentication
END DATA-CHANGED // This line must be used to terminate the data changed
// message
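The draft gives the fields of each message but not the exact wire encoding. The sketch below renders a DATA-CHANGED message assuming one 'Name: value' pair per line, CR/LF line endings (borrowed from the Data field of sec. 5.3), and that the text after '//' is commentary rather than part of the message. The handle and the timestamp format are hypothetical, since the draft says only that times are GMT.

```python
def format_message(kind: str, fields: list[tuple[str, str]]) -> str:
    """Render '<KIND>:', one 'Name: value' line per field, then 'END <KIND>'.

    Assumed encoding: CR/LF line endings, no '//' commentary on the wire.
    """
    lines = [f"{kind}:"]
    lines += [f"{name}: {value}".rstrip() for name, value in fields]
    lines.append(f"END {kind}")
    return "\r\n".join(lines) + "\r\n"


msg = format_message("DATA-CHANGED", [
    ("Version-number", "1"),
    ("Time-of-latest-centroid-change", "19921110120000"),  # timestamp format assumed
    ("Time-of-message-generation", "19921110120500"),
    ("Server-handle", "EXAMPLE-HANDLE"),                    # hypothetical handle
    ("Best-time-to-poll", "19921111030000"),
    ("Authentication-type", "NONE"),
    ("Authentication-data", ""),
])
```

The same helper would serve for POLL and FORWARDED-QUERY; the repeated multi-line Data blocks of a CENTROID-CHANGES report would need separate handling.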
5.2 Polling syntax
POLL:
Version-number: // version number of poller's index software, used to
// ensure compatibility
Start-time: // give me all the centroid changes starting at this time, GMT
End-time: // ending at this time, GMT
Template: // a standard whois++ template name, or the keyword ALL, for a
// full update.
Field: // used to limit centroid update information to specific fields,
// is either a specific field name, a list of field names,
// or the keyword ALL
Server-handle: // IANA unique identifier for the polling server.
// this handle may optionally be cached by the polled
// server to announce future changes
Authentication-type: // Type of authentication used by poller, or NONE
Authentication-data: // Data for authentication
END POLL // This line must be used to terminate the poll message
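On the pollee side, the Template and Field values of a POLL select part of the server's centroid, which is then further restricted to what the poller is allowed to receive (sec. 4.3.4). A minimal sketch follows. The draft does not define how a list of field names is written, so a comma-separated list is assumed here, and the access policy is reduced to a hypothetical set of templates the poller may see.

```python
from typing import Optional

Centroid = dict[str, dict[str, set[str]]]  # template -> attribute -> word set


def select_for_poll(centroid: Centroid, template: str, field_spec: str,
                    allowed_templates: set[str]) -> Centroid:
    """Part of a centroid requested by a POLL and permitted to this poller."""
    if template.strip() == "ALL":
        templates = set(centroid)
    else:
        templates = {template.strip()}
    wanted: Optional[set[str]] = None  # None means every field (keyword ALL)
    if field_spec.strip() != "ALL":
        # Assumption: the draft does not define a list syntax; commas are used here.
        wanted = {name.strip() for name in field_spec.split(",") if name.strip()}
    result: Centroid = {}
    for name in templates & allowed_templates:
        chosen = {attr: set(words) for attr, words in centroid.get(name, {}).items()
                  if wanted is None or attr in wanted}
        if chosen:
            result[name] = chosen
    return result


# The centroid from the example in sec. 4.2.
centroid = {
    "User": {"First Name": {"Joe", "John"}, "Last Name": {"Smith"},
             "Favourite Drink": {"Beer", "Labatt", "Molson"}},
    "Domain": {"Domain Name": {"foo.edu"}, "Contact Name": {"Mike", "Foobar"}},
}
```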
5.3 Centroid change report
CENTROID-CHANGES:
Version-number: // version number of pollee's index software, used to
// ensure compatibility
Start-time: // change list starting time, GMT
End-time: // change list ending time, GMT
Server-handle: // IANA unique identifier of the responding server
Authentication-type: // Type of authentication used by pollee, or NONE
Authentication-data: // Data for authentication
Compression-type: // Type of compression used on the data, or NONE
Size-of-compressed-data: // size of compressed data if compression is used
Operation: // One of 3 keywords: ADD, DELETE, FULL
// ADD - add these entries to the centroid for this server
// DELETE - delete these entries from the centroid of this
// server
// FULL - the full centroid as of end-time follows
Multiple occurrences of the following block of fields:
Template: // a standard whois++ template name
Field: // a field name within that template
Data: // the word list itself, one per line, cr/lf terminated
end of multiply repeated block
END CENTROID-CHANGES // This line must be used to terminate the centroid
// change report
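A poller might apply a received report to its stored copy of the pollee's centroid as sketched below. The sketch assumes each repeated Template/Field/Data block has already been parsed into a (template, field, words) tuple, and it follows the three Operation keywords as described above. It additionally drops word lists and templates that a DELETE leaves empty, which is an assumption rather than something the draft requires. The stored copy is never modified in place.

```python
Centroid = dict[str, dict[str, set[str]]]  # template -> attribute -> word set
Block = tuple[str, str, list[str]]         # parsed (Template, Field, Data words)


def apply_changes(stored: Centroid, operation: str, blocks: list[Block]) -> Centroid:
    """Apply one CENTROID-CHANGES report to a poller's copy of a server's centroid."""
    if operation == "FULL":
        updated: Centroid = {}  # the report is the whole centroid as of End-time
    elif operation in ("ADD", "DELETE"):
        updated = {t: {f: set(w) for f, w in attrs.items()} for t, attrs in stored.items()}
    else:
        raise ValueError(f"unknown Operation: {operation}")
    for template, field_name, words in blocks:
        if operation == "DELETE":
            current = updated.get(template, {}).get(field_name)
            if current is None:
                continue  # nothing stored for this field
            current.difference_update(words)
            if not current:  # assumed: drop emptied word lists and templates
                del updated[template][field_name]
                if not updated[template]:
                    del updated[template]
        else:  # ADD or FULL: add the words to the copy
            updated.setdefault(template, {}).setdefault(field_name, set()).update(words)
    return updated


stored = {"User": {"Last Name": {"Smith"}}}
after_add = apply_changes(stored, "ADD", [("User", "Last Name", ["Jones"])])
```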
5.4 Forwarded query
FORWARDED-QUERY:
Version-number: // version number of forwarder's index software, used to
// ensure compatibility
Forwarded-From: // IANA unique identifier of the server forwarding query
Forwarded-time: // time this query was forwarded, GMT (used for debugging)
Trace-option: // YES if query has 'trace' option listed, NO if not.
// used at message reception time to generate trace information
Query-origination-address: // address of origin of query
Body-of-Query: // The original query goes here
Authentication-type: // Type of authentication used by queryer
Authentication-data: // Data for authentication
END FORWARDED-QUERY // This line must be used to terminate the body of the
// query
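A receiving server must parse a FORWARDED-QUERY before it can check the authentication data and decide whether to honour the Trace-option. The sketch below is a minimal parser under assumed encoding rules: one 'Name: value' pair per line (split at the first colon), no '//' commentary on the wire, and a single-line Body-of-Query. The handle and address in the usage example are hypothetical.

```python
def parse_message(text: str, kind: str) -> dict[str, str]:
    """Parse a '<KIND>:' ... 'END <KIND>' message into a field dictionary."""
    lines = [line.rstrip("\r") for line in text.split("\n") if line.strip()]
    if not lines or lines[0].strip() != f"{kind}:":
        raise ValueError(f"message does not start with {kind}:")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == f"END {kind}":
            return fields
        name, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        fields[name.strip()] = value.strip()
    raise ValueError(f"missing END {kind} line")


raw = ("FORWARDED-QUERY:\r\n"
       "Version-number: 1\r\n"
       "Forwarded-From: EXAMPLE-INDEX\r\n"   # hypothetical handle
       "Forwarded-time: 19921110120000\r\n"  # timestamp format assumed
       "Trace-option: YES\r\n"
       "Query-origination-address: client.example.edu\r\n"
       "Body-of-Query: smith\r\n"
       "Authentication-type: NONE\r\n"
       "Authentication-data:\r\n"
       "END FORWARDED-QUERY\r\n")
query = parse_message(raw, "FORWARDED-QUERY")
wants_trace = query["Trace-option"] == "YES"
```

Requiring the END line, as the draft does for every message, lets the receiver reject a truncated message instead of acting on a partial query.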
6. Authors' Addresses
Chris Weider
clw@merit.edu
Industrial Technology Institute, Pod G
2901 Hubbard Rd,
Ann Arbor, MI 48105
O: (313) 747-2730
F: (313) 747-3185
Jim Fullton
fullton@mdewey.ga.unc.edu
310 Wilson Library CB #3460
University of North Carolina
Chapel Hill, NC 27599-3460
O: (919) 962-9107
F: (919) 962-5604
Simon Spero
ses@sunsite.unc.edu
310 Wilson Library CB #3460
University of North Carolina
Chapel Hill, NC 27599-3460
O: (919) 962-9107
F: (919) 962-5604